31 research outputs found

    Phylogenetic inference under varying proportions of indel-induced alignment gaps

    Background: The effect of alignment gaps on phylogenetic accuracy has been the subject of numerous studies. In this study, we investigated the relationship between the total number of gapped sites and phylogenetic accuracy, when the gaps were introduced (by means of computer simulation) to reflect indel (insertion/deletion) events during the evolution of DNA sequences. The resulting (true) alignments were subjected to commonly used gap treatment and phylogenetic inference methods. Results: (1) In general, there was a strong – almost deterministic – relationship between the amount of gap in the data and the level of phylogenetic accuracy when the alignments were very "gappy", (2) gaps resulting from deletions (as opposed to insertions) contributed more to the inaccuracy of phylogenetic inference, (3) the probabilistic methods (Bayesian, PhyML, and "MLε", a method implemented in DNAML in PHYLIP) performed better at most levels of gap percentage when compared to parsimony (MP) and distance (NJ) methods, with Bayesian analysis being clearly the best, (4) methods that treat gapped sites as missing data yielded less accurate trees when compared to those that attribute phylogenetic signal to the gapped sites (by coding them as binary character data – presence/absence, or as in the MLε method), and (5) in general, the accuracy of phylogenetic inference depended upon the amount of available data when the gaps resulted from mainly deletion events, and the amount of missing data when insertion events were equally likely to have caused the alignment gaps. Conclusion: When gaps in an alignment are a consequence of indel events in the evolution of the sequences, the accuracy of phylogenetic analysis is likely to improve if: (1) alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, (2) the evolutionary signal provided by indels is harnessed in the phylogenetic analysis, and (3) methods that utilize the phylogenetic signal in indels are developed for distance methods too. When the true homology is known and the amount of gaps is 20 percent of the alignment length or less, the methods used in this study are likely to yield trees with 90–100 percent accuracy.
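
    The two quantities this abstract turns on, the proportion of gapped sites and the coding of gaps as binary presence/absence characters, can be illustrated with a minimal Python sketch. The function names, the per-column coding scheme, and the definition of "gapped site" as any column containing at least one "-" are assumptions made for illustration; the abstract does not specify the exact coding the study used.

```python
def gapped_site_fraction(alignment):
    """Fraction of alignment columns that contain at least one gap ('-').

    `alignment` is a list of equal-length aligned sequences (assumed input format).
    """
    columns = list(zip(*alignment))
    gapped = sum(1 for col in columns if "-" in col)
    return gapped / len(columns)


def binary_gap_matrix(alignment):
    """Code each gapped column as one binary character per sequence.

    '1' means a residue is present, '0' means a gap. Columns without gaps
    carry no presence/absence information and are skipped. This per-column
    scheme is only one possible coding, chosen here for simplicity.
    """
    gapped_columns = [col for col in zip(*alignment) if "-" in col]
    return [
        "".join("0" if col[i] == "-" else "1" for col in gapped_columns)
        for i in range(len(alignment))
    ]


alignment = ["ACGT-", "A-GTT", "ACGTT"]
fraction = gapped_site_fraction(alignment)   # 2 of 5 columns contain a gap
gap_characters = binary_gap_matrix(alignment)
```

    The binary rows could be appended to the nucleotide data as extra characters, which is the general idea behind treating gaps as phylogenetic signal rather than as missing data.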

    Tubulin evolution in insects: gene duplication and subfunctionalization provide specialized isoforms in a functionally constrained gene family

    Background: The completion of 19 insect genome sequencing projects spanning six insect orders provides the opportunity to investigate the evolution of important gene families, here tubulins. Tubulins are a family of eukaryotic structural genes that form microtubules, fundamental components of the cytoskeleton that mediate cell division, shape, motility, and intracellular trafficking. Previous in vivo studies in Drosophila find a stringent relationship between tubulin structure and function; small, biochemically similar changes in the major alpha 1 or testis-specific beta 2 tubulin protein render each unable to generate a motile spermtail axoneme. This has evolutionary implications: not a single non-synonymous substitution is found in beta 2 among 17 species of Drosophila and Hirtodrosophila flies spanning 60 Myr of evolution. This raises an important question: how do tubulins evolve while maintaining their function? To answer it, we use molecular evolutionary analyses to characterize the evolution of insect tubulins. Results: Sixty-six alpha tubulin and eighty-six beta tubulin gene copies were retrieved and subjected to molecular evolutionary analyses. Four ancient clades of alpha and beta tubulins are found in insects: a major isoform clade (alpha 1, beta 1) and three minor, tissue-specific clades (alpha 2-4, beta 2-4). Based on a Homarus americanus (lobster) outgroup, these were generated through gene duplication events on major beta and alpha tubulin ancestors, followed by subfunctionalization in expression domain. Strong purifying selection acts on all tubulins, yet maximum pairwise amino acid distances between tubulin paralogs are large (0.464 substitutions/site for beta tubulins, 0.707 for alpha tubulins). Conversely, orthologs, with the exception of reproductive tissue isoforms, show little sequence variation except in the last 15 carboxy terminus tail (CTT) residues, which serve as sites for post-translational modifications (PTMs) and interactions with microtubule-associated proteins. CTT residues overwhelmingly comprise the co-evolving residues between Drosophila alpha 2 and beta 3 tubulin proteins, indicating that CTT specializations can be mediated at the level of the tubulin dimer. Gene duplications post-dating separation of the insect orders are unevenly distributed, most often appearing in the major alpha 1 and minor beta 2 clades. More than 40 introns are found in tubulins. Their distribution among tubulins reveals that insertion and deletion events are common, which is surprising given their potential for disrupting tubulin coding sequence. Compensatory evolution is found in Drosophila beta 2 tubulin cis-regulation, and reveals selective pressures acting to maintain testis expression without the use of previously identified testis cis-regulatory elements. Conclusion: Tubulins have stringent structure/function relationships, indicated by strong purifying selection, the loss of many gene duplication products, alpha-beta co-evolution in the tubulin dimer, and compensatory evolution in beta 2 tubulin cis-regulation. They evolve through gene duplication, subfunctionalization in expression domain, and divergence of duplication products, largely in CTT residues that mediate interactions with other proteins. This has resulted in the tissue-specific minor insect isoforms, and in particular the highly diverse α3, α4, and β2 reproductive tissue-specific tubulin isoforms, illustrating that even a highly conserved protein family can participate in the adaptive process and respond to sexual selection.

    Determining the Statistical Significance of Observed Frequencies of Short DNA Motifs in a Genome

    Until recently, over 90 percent of the DNA in the human genome was considered junk DNA, with no known function. However, this non-coding DNA is now known to harbor elements that perform important functions in gene regulation. In particular, there is currently much interest in the search for short DNA motifs collectively known as cis-regulatory elements. Most studies attempt to identify these elements by means of cross-species comparisons. We have approached the problem of finding cis-regulatory elements by searching for conserved DNA motifs within genomes. This requires searching for DNA motifs that are repeated in the genomes either more or less frequently than expected by random chance. However, the usual chi-squared test cannot be used to test the statistical significance of an observed frequency, because overlapping regions of the genome are checked for DNA motif matches and the counts at neighboring positions are therefore not independent. We present here a statistical measure that has been developed to quantify the expectation and variance of the frequency of a given DNA motif in a given target sequence that may contain overlapping regions.
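
    To make the overlap problem concrete, the following Python sketch counts overlapping motif occurrences and computes the expectation and variance of that count under a simple model in which bases are independent and identically distributed. This is the textbook i.i.d. result, shown here as an assumption; the abstract does not give the authors' own measure, and the function names and the uniform base frequencies below are hypothetical.

```python
from math import prod


def count_overlapping(seq, motif):
    """Count occurrences of `motif` in `seq`, allowing overlapping matches."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)


def word_prob(word, freqs):
    """Probability of `word` at a fixed position under independent bases."""
    return prod(freqs[base] for base in word)


def expected_and_variance(n, motif, freqs):
    """Mean and variance of the overlapping count in an i.i.d. sequence of length n.

    Var = sum of per-position variances + 2 * sum of covariances between
    positions closer than the motif length (positions further apart are
    independent, so their covariance is zero).
    """
    m = len(motif)
    positions = n - m + 1
    if positions <= 0:
        return 0.0, 0.0
    p = word_prob(motif, freqs)
    mean = positions * p
    var = positions * p * (1 - p)
    for d in range(1, m):
        pairs = positions - d          # number of start-position pairs at offset d
        if pairs <= 0:
            break
        # Two occurrences at offset d are possible only if the motif's
        # suffix of length m - d equals its prefix of the same length.
        self_overlap = motif[d:] == motif[:m - d]
        joint = p * word_prob(motif[m - d:], freqs) if self_overlap else 0.0
        var += 2 * pairs * (joint - p * p)
    return mean, var


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
observed = count_overlapping("AAAA", "AA")
mean, var = expected_and_variance(4, "AA", uniform)
```

    Given these two moments, an observed count can be standardized as (observed - mean) / sqrt(var); self-overlapping motifs such as "AA" have a larger variance than a naive binomial calculation would suggest.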

    Invasive Predators Deplete Genetic Diversity of Island Lizards

    Invasive species can dramatically impact natural populations, especially those living on islands. Though numerous examples illustrate the ecological impact of invasive predators, no study has examined the genetic consequences for native populations subject to invasion. Here we capitalize on a natural experiment in which a long-term study of the brown anole lizard (Anolis sagrei) was interrupted by rat invasion. An island population that was devastated by rats recovered numerically following rat extermination. However, population genetic analyses at six microsatellite loci suggested a possible loss of genetic diversity due to invasion when compared to an uninvaded island studied over the same time frame. Our results provide partial support for the hypothesis that invasive predators can impact the genetic diversity of resident island populations.

    Protein Function Assignment through Mining Cross-Species Protein-Protein Interactions

    Background: As we move into the post genome-sequencing era, an immediate challenge is how to make best use of the large amount of high-throughput experimental data to assign functions to currently uncharacterized proteins. We here describe CSIDOP, a new method for protein function assignment based on shared interacting domain patterns extracted from cross-species protein-protein interaction data. Methodology/Principal Findings: The proposed method is assessed both biologically and statistically over the genome of H. sapiens. The CSIDOP method is capable of predicting protein function with an accuracy of 95.42% using 2,972 gene ontology (GO) functional categories. In addition, we are able to assign novel functional annotations for 181 previously uncharacterized proteins in H. sapiens. Furthermore, we demonstrate that for proteins that are already characterized by GO, CSIDOP may predict extra functions. This is attractive because a protein normally executes a variety of functions in different processes and its current GO annotation may be incomplete. Conclusions/Significance: It can be shown through experimental results that the CSIDOP method is reliable and practical in use. The method will continue to improve as more high quality interaction data becomes available and is readily scalable t

    miRNAs in Newt Lens Regeneration: Specific Control of Proliferation and Evidence for miRNA Networking

    Background: Lens regeneration in adult newts occurs via transdifferentiation of the pigment epithelial cells (PECs) of the dorsal iris. The same source of cells from the ventral iris is not able to undergo this process. In an attempt to understand this restriction we have previously studied the expression patterns of miRNAs. Among several miRNAs, we found that mir-148 shows an up-regulation in the ventral iris, while members of the let-7 family showed down-regulation in the dorsal iris during dedifferentiation. Methodology/Principal Findings: We have performed gain- and loss-of-function experiments of mir-148 and let-7b in an attempt to delineate their function. We find that up-regulation of mir-148 caused a significant decrease in the proliferation rates of ventral PECs only, while up-regulation of let-7b affected proliferation of both dorsal and ventral PECs. Neither miRNA was able to affect lens morphogenesis or induction. To further understand how this effect of miRNA up-regulation is mediated we examined global expression of miRNAs after up-regulation of mir-148 and let-7b. Interestingly, we identified a novel level of miRNA regulation, which might indicate that miRNAs are regulated as a network. Conclusion/Significance: The major conclusion is that different miRNAs can control proliferation in the dorsal or ventral iris, possibly by different mechanisms. Of interest is that down-regulation of the let-7 family members has also been documented in other systems undergoing reprogramming, such as in stem cells or oocytes. This might indicate tha

    Genetic Variations and Haplotype Diversity of the UGT1 Gene Cluster in the Chinese Population

    Vertebrates require tremendous molecular diversity to defend against numerous small hydrophobic chemicals. UDP-glucuronosyltransferases (UGTs) are a large family of detoxification enzymes that glucuronidate xenobiotics and endobiotics, facilitating their excretion from the body. The UGT1 gene cluster contains a tandem array of variable first exons, each preceded by a specific promoter, and a common set of downstream constant exons, similar to the genomic organization of the protocadherin (Pcdh), immunoglobulin, and T-cell receptor gene clusters. To assist pharmacogenomics studies in Chinese, we sequenced nine first exons, promoter and intronic regions, and five common exons of the UGT1 gene cluster in a population sample of 253 unrelated Chinese individuals. We identified 101 polymorphisms and found 15 novel SNPs. We then computed allele frequencies for each polymorphism and reconstructed their linkage disequilibrium (LD) map. The UGT1 cluster can be divided into five linkage blocks: Block 9 (UGT1A9), Block 9/7/6 (UGT1A9, UGT1A7, and UGT1A6), Block 5 (UGT1A5), Block 4/3 (UGT1A4 and UGT1A3), and Block 3′ UTR. Furthermore, we inferred haplotypes and selected their tagSNPs. Finally, comparing our data with those of three other populations of the HapMap project revealed ethnic specificity of the UGT1 genetic diversity in Chinese. These findings have important implications for future molecular genetic studies of the UGT1 gene cluster as well as for personalized medical therapies in Chinese.

    Complete Mitochondrial Genome Sequence of Three Tetrahymena Species Reveals Mutation Hot Spots and Accelerated Nonsynonymous Substitutions in Ymf Genes

    The ciliate Tetrahymena, a model organism, contains a divergent mitochondrial (Mt) genome with unusual properties, where half of its 44 genes still remain without a definitive function. These genes can be categorized into two major groups: KPC (known protein coding) and Ymf (genes without an identified function). To gain insights into the mechanisms underlying gene divergence and molecular evolution of Tetrahymena (T.) Mt genomes, we sequenced the Mt genomes of three species: T. paravorax, T. pigmentosa, and T. malaccensis. These genomes were aligned and the analyses were carried out using several programs that calculate distance, nucleotide substitution (dn/ds), and their rate ratios (ω) on individual codon sites and via a sliding window approach. Comparative genomic analysis indicated a conserved putative transcription control sequence, a GC box, in a region where transcription and replication presumably initiate. We also found distinct features in the Mt genome of T. paravorax despite similar genome organization among these ∼47 kb long linear genomes. Another significant finding was the presence of one or more highly variable regions in Ymf genes, where the majority of substitutions were concentrated. These regions were mutation hotspots where elevated distances and dn/ds ratios were primarily due to an increase in the number of nonsynonymous substitutions, suggesting relaxed selective constraint. However, in a few Ymf genes, accelerated rates of nonsynonymous substitutions may be due to positive selection. Similarly, at the protein level the majority of amino acid replacements occurred in these regions. Ymf genes comprise half of the genes in Tetrahymena Mt genomes, so understanding why they have not been assigned definitive functions is an important aspect of molecular evolution. Importantly, nucleotide substitution types and rates suggest possible reasons for not being able to find homologues for Ymf genes. Additionally, comparative genomic analysis of complete Mt genomes is essential in identifying biologically significant motifs such as control regions.

    PhyloM: A Computer Program for Phylogenetic Inference from Measurement or Binary Data, with Bootstrapping

    Quantitative and binary results are ubiquitous in biology. Inasmuch as an underlying genetic basis for the observed variation can be assumed, it is pertinent to infer the evolutionary relationships among the entities being measured. I present a computer program, PhyloM, that takes measurement data or binary data as input and directly generates a pairwise distance matrix, which can then be subjected to the popular neighbor-joining (NJ) algorithm to produce a phylogenetic tree. PhyloM also offers nonparametric bootstrapping for testing the level of support for the inferred phylogeny, and allows the user to root the tree on any desired branch. PhyloM was tested on Biolog Gen III growth data from isolates within the genus Chromobacterium and the closely related Aquitalea sp. This allowed a comparison with the genotypic tree inferred from whole-genome sequences for the same set of isolates. From this comparison, it was possible to infer parallel evolution. PhyloM is a stand-alone, easy-to-use computer program with a graphical user interface that computes pairwise distances from measurement or binary data; these distances can then be used to infer a phylogeny by NJ with a utility in the same program. Alternatively, the distance matrix can be downloaded for use in another program for phylogenetic inference or other purposes. It does not require any software to be installed or computer code to be written, and it is open source. The executable and computer code are available on GitHub.
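
    The first two steps described above, building a pairwise distance matrix from measurement data and drawing nonparametric bootstrap replicates, can be sketched in a few lines of Python. This is not PhyloM's code: the Euclidean metric, the function names, and the toy data are assumptions for illustration, since the abstract does not state which distance the program uses.

```python
import math
import random


def pairwise_distances(data):
    """Euclidean distance between every pair of entities.

    `data` maps an entity name to its list of measurements (or 0/1 values).
    The choice of Euclidean distance is an assumption, not PhyloM's
    documented metric.
    """
    names = sorted(data)
    return {a: {b: math.dist(data[a], data[b]) for b in names} for a in names}


def bootstrap_replicate(data, rng):
    """Resample characters (columns) with replacement, keeping all entities."""
    n_chars = len(next(iter(data.values())))
    columns = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {name: [values[c] for c in columns] for name, values in data.items()}


# Hypothetical growth measurements for three isolates.
measurements = {
    "isolate_A": [0.0, 0.0, 1.0],
    "isolate_B": [3.0, 4.0, 1.0],
    "isolate_C": [0.0, 0.0, 2.0],
}
matrix = pairwise_distances(measurements)

rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicate_matrices = [
    pairwise_distances(bootstrap_replicate(measurements, rng)) for _ in range(100)
]
```

    Each replicate matrix would then be passed to a neighbor-joining implementation, and the proportion of replicate trees containing a given grouping gives that grouping's bootstrap support.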